An Architecture for Deploying Reinforcement Learning in Industrial Environments
Industry 4.0 is driven by demands such as shorter time-to-market, mass customization of products, and batch-size-one production. Reinforcement Learning (RL), a machine learning paradigm shown to have great potential for reaching and surpassing human-level performance in numerous complex tasks, offers a way to meet these demands. In this paper, we present an OPC UA-based, Operational Technology (OT)-aware RL architecture that extends the standard RL setting by combining it with the setting of digital twins. Moreover,
we define an OPC UA information model that allows for a generalized, plug-and-play-like approach to exchanging the RL agent used. Finally, we demonstrate and evaluate the architecture with a proof of concept: by solving a toy example, we show that the architecture can be used to determine the optimal policy using a real control system.
Comment: This preprint has not undergone peer review or any post-submission improvements or corrections. The Version of Record of this contribution is published in Computer Aided Systems Theory - EUROCAST 2022 and is available online at https://doi.org/10.1007/978-3-031-25312-6_6
Deep Q-Learning versus Proximal Policy Optimization: Performance Comparison in a Material Sorting Task
This paper presents a comparison between two well-known deep Reinforcement
Learning (RL) algorithms: Deep Q-Learning (DQN) and Proximal Policy
Optimization (PPO) in a simulated production system. We utilize a Petri Net
(PN)-based simulation environment, which was previously proposed in related
work. The performance of the two algorithms is compared based on several
evaluation metrics, including average percentage of correctly assembled and
sorted products, average episode length, and percentage of successful episodes.
The results show that PPO outperforms DQN on all evaluation metrics. The study highlights the advantages of policy-based algorithms in problems with high-dimensional state and action spaces, and it contributes to the field of deep RL in the context of production systems by providing insights into the effectiveness of different algorithms and their suitability for different tasks.
Comment: Submitted and accepted version to the 32nd International Symposium on Industrial Electronics (ISIE), Helsinki, Finland
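The three evaluation metrics named above can be computed from per-episode logs. A minimal sketch, assuming a hypothetical episode record format (the field names and numbers below are illustrative, not the paper's logging scheme):

```python
# Hedged sketch: computing the abstract's three evaluation metrics from
# hypothetical per-episode logs. The record layout and values are assumptions.

episodes = [  # (correct_products, total_products, steps, success)
    (9, 10, 120, True),
    (10, 10, 95, True),
    (6, 10, 200, False),
]

# Average percentage of correctly assembled and sorted products per episode.
avg_correct_pct = 100 * sum(c / t for c, t, _, _ in episodes) / len(episodes)
# Average episode length in simulation steps.
avg_episode_len = sum(s for _, _, s, _ in episodes) / len(episodes)
# Percentage of episodes that terminated successfully.
success_pct = 100 * sum(ok for _, _, _, ok in episodes) / len(episodes)
```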
A Modular Test Bed for Reinforcement Learning Incorporation into Industrial Applications
This application paper explores the potential of using reinforcement learning
(RL) to address the demands of Industry 4.0, including shorter time-to-market,
mass customization, and batch size one production. Specifically, we present a
use case in which the task is to transport and assemble goods through a model
factory following predefined rules. Each simulation run involves placing a
specific number of goods of random color at the entry point. The objective is
to transport the goods to the assembly station, where two rivets are installed
in each product, connecting the upper part to the lower part. Following the
installation of rivets, blue products must be transported to the exit, while
green products are to be transported to storage. The study focuses on the
application of reinforcement learning techniques to address this problem and improve the efficiency of the production process.
Comment: Submitted and accepted version to the 5th International Data Science Conference (iDSC), Krems, Austria
BMC Bioinformatics / MAESTRO - multi agent stability prediction upon point mutations
Background: Point mutations can have a strong impact on protein stability. A change in stability may subsequently lead to dysfunction and finally cause diseases. Moreover, protein engineering approaches aim to deliberately modify protein properties, where stability is a major constraint. In order to support basic research and protein design tasks, several computational tools for predicting the change in stability upon mutations have been developed. Comparative studies have shown the usefulness but also the limitations of such programs.
Results: We aim to contribute a novel method for predicting changes in stability upon point mutation in proteins, called MAESTRO. MAESTRO is structure-based and distinguishes itself from similar approaches in the following points: (i) MAESTRO implements a multi-agent machine learning system. (ii) It also provides predicted free energy change (ΔΔG) values and a corresponding prediction confidence estimation. (iii) It provides high-throughput scanning for multi-point mutations, where sites and types of mutation can be comprehensively controlled. (iv) Finally, the software provides a specific mode for the prediction of stabilizing disulfide bonds. The predictive power of MAESTRO for single point mutations and stabilizing disulfide bonds is comparable to similar methods.
Conclusions: MAESTRO is a versatile tool in the field of stability change prediction upon point mutations. Executables for the Linux and Windows operating systems are freely available to non-commercial users from http://biwww.che.sbg.ac.at/MAESTRO.
Josef Laimer, Heidi Hofer, Marko Fritz, Stefan Wegenkittl and Peter Lackner
The pLab Picturebook: Load Tests and Ultimate Load Tests, Part II: Subsequences Report
This is Part II of an exhaustive empirical study on the equidistribution and correlation properties of pseudorandom numbers. Here, we apply the test design introduced in Part I to the analysis of well-chosen subsequences, which arise from splitting a given sequence of pseudorandom numbers. Such a setup is common within parallel applications. Theory predicts extreme sensitivity of linear congruential generators and assures the robustness of inversive methods. We give striking examples of how this can affect a stochastic simulation, i.e. an empirical test. Keywords: pseudorandom number generation, empirical tests, stochastic simulation, subsequence analysis, splitting methods
1 Introduction
A well-known technique to achieve parallel streams of pseudorandom numbers for stochastic simulation is so-called splitting into subsequences, also known as leapfrog or jump-ahead. A given generator producing the sequence x_0, x_1, ... is split into k parallel streams by setting x^(l) ..
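The leapfrog splitting described above can be sketched in a few lines. This is a minimal illustration, assuming a textbook linear congruential generator (the LCG constants below are the common Numerical Recipes parameters, not the generators studied in the report):

```python
# Hedged sketch: leapfrog splitting of one PRNG stream into k substreams,
# where substream l receives x_l, x_{l+k}, x_{l+2k}, ...
from itertools import islice

def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield a linear congruential sequence x_{n+1} = (a*x_n + c) mod m,
    normalized to [0, 1). Parameters are illustrative assumptions."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m

def leapfrog(seed, k, n):
    """Split one base stream into k substreams of n numbers each."""
    base = list(islice(lcg(seed), k * n))
    return [base[l::k] for l in range(k)]

streams = leapfrog(seed=42, k=4, n=5)
# The lattice structure of LCGs can make such substreams strongly
# correlated, which is exactly the sensitivity the study probes.
```

Interleaving the substreams back together recovers the original base sequence, which is the defining property of the split.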
A Generalized φ-Divergence for Asymptotically Multivariate Normal Models
I. Csiszár's (Magyar Tud. Akad. Mat. Kutató Int. Közl. 8 (1963), 85-108) φ-divergence, which was considered independently by M. S. Ali and S. D. Silvey (J. R. Statist. Soc. Ser. B 28 (1966), 131-142), gives a goodness-of-fit statistic for multinomially distributed data. We define a generalized φ-divergence that unifies the φ-divergence approach with that of C. R. Rao and S. K. Mitra ("Generalized Inverse of Matrices and Its Applications," Wiley, New York, 1971) and derive weak convergence to a χ² distribution under the assumption of asymptotically multivariate normally distributed data vectors. As an example we discuss the application to the frequency count in Markov chains and thereby give a goodness-of-fit test for observations from dependent processes with finite memory.
Keywords: distribution of statistics, hypothesis testing, Markov processes: hypothesis testing (inference from stochastic processes), asymptotic distribution theory
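The classical φ-divergence underlying this work can be illustrated with a small worked example. The sketch below shows Csiszár's D_φ(P‖Q) = Σ q_i · φ(p_i / q_i) and the special case φ(t) = (t − 1)², which (scaled by the sample size n) recovers Pearson's χ² goodness-of-fit statistic; the counts are made-up illustrative data, and the paper's generalized divergence is not reproduced here:

```python
# Hedged sketch: Csiszár's φ-divergence and its Pearson χ² special case.

def phi_divergence(p, q, phi):
    """D_phi(P || Q) = sum_i q_i * phi(p_i / q_i) over cells with q_i > 0."""
    return sum(qi * phi(pi / qi) for pi, qi in zip(p, q) if qi > 0)

counts = [18, 25, 32, 25]          # observed multinomial counts (hypothetical)
n = sum(counts)
p_hat = [c / n for c in counts]    # empirical cell probabilities
q = [0.25] * 4                     # null hypothesis: uniform distribution

chi2_phi = lambda t: (t - 1) ** 2
stat = n * phi_divergence(p_hat, q, chi2_phi)  # Pearson's X^2 statistic
# Under H0, stat is asymptotically chi-squared with k - 1 = 3 degrees
# of freedom, which is the convergence the paper generalizes.
```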
Gambling Tests for Pseudorandom Number Generators
This paper extends the idea of serial tests by employing a carefully selected dimension reduction which is equivalent to playing a gambling strategy in a fair coin-flipping game. We apply the generalized φ-divergence for testing the hypothesis that the simulated coin is fair and memoryless. An application to Twisted GFSR generators shows the ability of our test to detect deviations from equidistribution in high dimensions.
1 Introduction
Numerous tests have been suggested for the empirical quality assessment of pseudorandom number generators (PRNGs); see [5] for an introduction. The standard battery of tests, including serial (relative-frequency-based) tests for overlapping and non-overlapping tuples and run (permutation-based) tests, has recently been extended towards random-walk simulation [3,13--17]. We somewhat follow this direction and consider a gambling policy in a simple fair coin-flipping game. The formulation in terms of gambling being transparent and evident, our test also..
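The gambling-test idea can be sketched as follows. The trivial "bet that the next flip repeats the last one" policy below is an illustrative stand-in for the paper's carefully selected dimension reduction, and Python's Mersenne Twister is only a placeholder generator:

```python
# Hedged sketch: reduce a stream of uniforms to the win/loss record of a
# betting policy in a fair coin game, then check that record for fairness.
import random

def flips_from_uniforms(uniforms):
    """Map U(0,1) samples to coin flips: 1 if u < 0.5, else 0."""
    return [1 if u < 0.5 else 0 for u in uniforms]

def play(flips):
    """Bet each round that the coin repeats its previous face; record wins."""
    return [int(flips[i] == flips[i - 1]) for i in range(1, len(flips))]

rng = random.Random(0)
wins = play(flips_from_uniforms(rng.random() for _ in range(10_001)))

# For a fair, memoryless coin the win indicators are i.i.d. Bernoulli(1/2);
# a frequency (or phi-divergence) test on `wins` flags generators whose
# successive outputs are correlated.
win_rate = sum(wins) / len(wins)
```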
Benefits from Variational Regularization in Language Models
Representations from common pre-trained language models have been shown to suffer from the degeneration problem, i.e., they occupy a narrow cone in latent space. This problem can be addressed by enforcing isotropy in latent space. In analogy with variational autoencoders, we suggest applying a token-level variational loss to a Transformer architecture and optimizing the standard deviation of the prior distribution in the loss function as the model parameter to increase isotropy. The resulting latent space is complete and interpretable: any given point is a valid embedding and can be decoded into text again. This allows for text manipulations such as paraphrase generation directly in latent space. Surprisingly, features extracted at the sentence level also show competitive results on benchmark classification tasks.
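A token-level variational regularizer of the kind described can be sketched with the closed-form Gaussian KL term. Treating each token embedding as a diagonal Gaussian posterior N(μ, σ²) against a prior N(0, σ_p²) with tunable standard deviation is our reading of the abstract; the actual Transformer integration and loss weighting are not shown here, and the arrays below are made-up stand-ins for model outputs:

```python
# Hedged sketch: token-level KL regularizer with a trainable prior std.
import numpy as np

def token_kl(mu, sigma, prior_std):
    """KL( N(mu, sigma^2) || N(0, prior_std^2) ), summed over embedding
    dimensions and averaged over tokens. mu, sigma: shape (tokens, dims)."""
    kl = (np.log(prior_std / sigma)
          + (sigma**2 + mu**2) / (2 * prior_std**2)
          - 0.5)
    return kl.sum(axis=-1).mean()

rng = np.random.default_rng(0)
mu = rng.normal(0.0, 0.1, size=(4, 8))     # hypothetical token means
sigma = np.full((4, 8), 0.5)               # hypothetical token std devs
loss = token_kl(mu, sigma, prior_std=1.0)  # added to the LM training loss
```

Shrinking the KL term pulls every token posterior toward the shared prior, which is one way such a loss encourages an isotropic latent space.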